fix(zarr-metadata): model stored metadata more closely #3962
Merged
Zarr V2 uses a separate JSON document named `.zattrs` for the attributes of an array or group. This package was inconsistent about how it modelled this fact. The array metadata document type modelled the array fields (`shape`, `dtype`, etc.), which are stored in `.zarray`, AND the `attributes` field, which is stored in `.zattrs`. Thus the array metadata model matched the representation of an array that a program might use, rather than the stored layout. But the group metadata type didn't follow this pattern: it has no `attributes` field. This PR addresses that inconsistency by adding an optional `attributes` field to `GroupMetadataV2`. To model the stored representation of V2 data, this PR adds three new types, `ZArrayMetadata`, `ZGroupMetadata`, and `ZAttrsMetadata`, which closely model the contents of the `.zarray`, `.zgroup`, and `.zattrs` documents, respectively. This change makes the V2 consolidated metadata type more accurate, because consolidated metadata for Zarr V2 consists of inlined metadata documents.
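As a rough illustration of the stored layout, the sketch below models the three V2 documents with `TypedDict`. These definitions are illustrative, not the package's own: the field sets follow the Zarr V2 specification, `split_array_metadata` and `consolidate` are hypothetical helpers, and the `zarr_consolidated_format`/`metadata` keys follow the `.zmetadata` convention used by zarr-python for V2.

```python
from __future__ import annotations

from typing import Any, Literal, TypedDict, cast


class _ZArrayRequired(TypedDict):
    zarr_format: Literal[2]
    shape: list[int]
    chunks: list[int]
    dtype: Any  # a typestr like "<f8", or a list for structured dtypes
    compressor: dict[str, Any] | None
    fill_value: Any
    order: Literal["C", "F"]
    filters: list[dict[str, Any]] | None


class ZArrayMetadata(_ZArrayRequired, total=False):
    """Illustrative model of a `.zarray` document."""

    dimension_separator: Literal[".", "/"]  # optional in the V2 spec


class ZGroupMetadata(TypedDict):
    """Illustrative model of a `.zgroup` document."""

    zarr_format: Literal[2]


# `.zattrs` holds an arbitrary JSON object.
ZAttrsMetadata = dict[str, Any]


def split_array_metadata(
    model: dict[str, Any],
) -> tuple[ZArrayMetadata, ZAttrsMetadata]:
    """Hypothetical helper: split an in-memory array model (fields plus
    `attributes`) into the `.zarray` and `.zattrs` documents that are stored."""
    zarray = {k: v for k, v in model.items() if k != "attributes"}
    zattrs = dict(model.get("attributes", {}))
    return cast(ZArrayMetadata, zarray), zattrs


def consolidate(documents: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: inline stored documents keyed by path."""
    return {"zarr_consolidated_format": 1, "metadata": dict(documents)}


# Usage: one group containing one array "x".
array_model = {
    "zarr_format": 2,
    "shape": [100],
    "chunks": [10],
    "dtype": "<f8",
    "compressor": None,
    "fill_value": 0.0,
    "order": "C",
    "filters": None,
    "attributes": {"units": "m"},
}
zarray, zattrs = split_array_metadata(array_model)
group: ZGroupMetadata = {"zarr_format": 2}
consolidated = consolidate(
    {".zgroup": group, "x/.zarray": zarray, "x/.zattrs": zattrs}
)
```

Keeping the three documents as separate types lets each consolidated-metadata entry be typed by the document it inlines, rather than by a program-level model that merges `.zarray` and `.zattrs`.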
This reverts commit 629e565.
…Metadata at top level

The on-disk file types added in 8b7af90 were importable from the v2 submodule but not from the package root. Add them to the top-level `__init__.py` so consumers can import them as `zarr_metadata.ZArrayMetadata` etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff    main     #3962    +/-
Coverage        93.28%   93.28%
Files               87       87
Lines            11745    11745
Hits             10956    10956
Misses             789      789
Contributor (author): cc @chuckwondo
ilan-gold reviewed on May 12, 2026.
ilan-gold approved these changes on May 12, 2026.
Closed
d-v-b added a commit to d-v-b/zarr-python that referenced this pull request on May 14, 2026:
uv.lock was removed in zarr-developers#3962 as unused. The mypy-via-hatch change in this branch makes it load-bearing again: it is the single source of truth that keeps the `dev` hatch environment (and therefore mypy's results) consistent across developer machines and CI. Restore it, regenerated against the current pyproject.toml.
d-v-b added a commit that referenced this pull request on May 15, 2026:
* chore: ignore docs/superpowers/ scratch directory
* chore: pin python 3.12 on hatch dev env
* chore: run mypy from hatch dev env, drop mirrors-mypy hook
Replace the pre-commit/mirrors-mypy hook (which maintained its own
duplicate dep list) with a `repo: local` hook that runs
`hatch run dev:mypy`. The dev hatch env's `dev` group (resolved via
uv.lock) becomes the single source of truth for mypy's dependency set.
This also unpins numpy from the type-check environment (it was
hard-pinned to `numpy==2.1` in the old hook); type fixes that follow
keep mypy clean against current numpy stubs:
- relax NDArrayLike.reshape/all signatures so np.ndarray
structurally satisfies the protocol
- widen AsyncGroup.require_array's `dtype` to include None
- add narrowly-scoped `# type: ignore` comments with explanatory
notes where numpy 2.x stubs are too strict against runtime-valid
calls (datetime64 unit f-strings, 'generic' unit sentinel,
newbyteorder subclass identity, ZDTypeLike None handling)
- drop stale `# type: ignore` comments that are no longer needed
* ci: install hatch in lint workflow so mypy hook can run
* docs: changelog for mypy-in-dev-env change
* refactor: resolve None dtype at create() boundary
`create()` accepts `dtype=None` (legacy v2 behavior: an unspecified
dtype defaults to float64). Previously this `None` was forwarded
untyped into `_create`, which doesn't accept `None` — it only worked
because `parse_dtype(None)` -> `np.dtype(None)` happens to resolve to
float64. That required a `cast()` to silence mypy.
Resolve `None` to `"float64"` explicitly in `create()` before
forwarding, so the value passed to `_create` is a real dtype and the
cast is no longer needed. No behavior change.
* refactor: give NDArrayLike.reshape/all precise signatures
The initial fix for numpy-stub conformance widened the NDArrayLike
protocol's `reshape` and `all` to `(*args: Any, **kwargs: Any) -> Any`,
which erased type information for every consumer of the protocol.
Replace with precise signatures that np.ndarray still satisfies
structurally:
- `reshape(shape: tuple[int, ...], /, *, order=..., copy=...) -> NDArrayLike`:
the `Literal[-1]` form was the only thing blocking a precise signature
(it straddles numpy's arity-split overloads); it is unused on
protocol-typed values, so drop it.
`NDBuffer.reshape` keeps its public `-1` support by normalizing
`-1` to `(-1,)` before forwarding.
- `all(self) -> np.bool_` — the sole caller wraps the result in
`bool(...)`, and no-arg is all we use.
* chore: remove gitignore for claude docs
* chore: restore uv.lock
uv.lock was removed in #3962 as unused. The mypy-via-hatch change in
this branch makes it load-bearing again: it is the single source of
truth that keeps the `dev` hatch environment (and therefore mypy's
results) consistent across developer machines and CI. Restore it,
regenerated against the current pyproject.toml.
* docs: rename changelog entry to PR #3972
* ci: skip mypy hook on pre-commit.ci
The mypy hook is now `language: system` and shells out to
`hatch run dev:mypy`, which needs the project's hatch dev environment.
pre-commit.ci's hosted runners don't have it, so the hook can only
fail there. Add it to `ci.skip`; mypy is still covered by the Lint
GitHub Actions workflow (which installs hatch) and by local prek runs.
* refactor: apply review nitpicks from PR #3972
- Inline the float64 dtype default into the `_create` call instead of
reassigning the `dtype` variable.
- Move the numpy 2.x stub explanation onto its own line above the code
so `# type: ignore` comment lines stay short.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: run mypy via `uv run` so the lockfile is actually honored
`hatch run dev:mypy` does not consume `uv.lock` — hatch has no lockfile
support and re-resolves the `dev` dependency group from scratch each time
it builds the environment. This defeated the PR's goal of a reproducible
type-checking environment: contributors with stale or differently-resolved
hatch `dev` envs saw different mypy results (e.g. errors from an older
`tomlkit` whose `TOMLDocument.__getitem__` was typed `Item | Container`
rather than `Any`).
Switch the mypy pre-commit hook and the Lint workflow to `uv run --frozen
mypy`. `uv` does sync from `uv.lock`, so the committed lockfile becomes the
real single source of truth for mypy's dependency set, identical for every
contributor and for CI.
- .pre-commit-config.yaml: hook entry `hatch run dev:mypy` -> `uv run --frozen mypy`
- .github/workflows/lint.yml: install `uv` instead of `hatch`
- pyproject.toml / changes: update wording to match
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply suggestion from @maxrjones
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
* fix: test dtype is None exactly, not via falsy collapse
`dtype or "float64"` substitutes the default for any falsy input —
empty string, 0, empty Mapping — not just None. Those wouldn't pass
ZDTypeLike validation anyway, but the failure mode was "silent
substitution to float64" instead of "raise on invalid input".
Use an exact `is None` check expressed as a conditional expression.
* chore: add .python-version pinning default to 3.12
uv reads `.python-version` to decide which interpreter to use for
`uv venv` / `uv sync` / `uv run`. With the mypy hook now running as
`uv run --frozen mypy`, pinning the interpreter here keeps the dev
env consistent across developer machines — matching the existing
`[tool.mypy].python_version = "3.12"` and `requires-python = ">=3.12"`
declarations.
`.python-version` is not consumed by hatch (its envs declare their
own Python via `[tool.hatch.envs.*].python`), so the test matrix
(py3.12/3.13/3.14) is unaffected.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
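The `create()` dtype handling described in the commits above (resolving `None` at the public boundary so `_create` never sees it, then replacing `dtype or "float64"` with an exact `is None` check) can be sketched as follows. This is a simplified stand-in, not zarr-python's code: `validate_dtype` is a hypothetical toy for `ZDTypeLike` validation, `create_with_or` exists only for contrast, and `dtype` is typed `Any` as a loose placeholder for `ZDTypeLike`.

```python
from typing import Any


def validate_dtype(dtype: object) -> str:
    # Hypothetical stand-in for ZDTypeLike validation: this toy version
    # accepts only non-empty strings and raises on anything else.
    if not isinstance(dtype, str) or not dtype:
        raise ValueError(f"invalid dtype: {dtype!r}")
    return dtype


def _create(shape: tuple[int, ...], dtype: str) -> dict[str, Any]:
    # Private helper: `dtype` is typed `str`, never None, so no cast()
    # is needed to forward a resolved value here.
    return {"shape": shape, "dtype": validate_dtype(dtype)}


def create_with_or(shape: tuple[int, ...], dtype: Any = None) -> dict[str, Any]:
    # Falsy collapse: "", 0 and {} are silently replaced by float64.
    return _create(shape, dtype or "float64")


def create(shape: tuple[int, ...], dtype: Any = None) -> dict[str, Any]:
    # Exact check: only None gets the legacy v2 float64 default; any
    # other value is forwarded unchanged and validated downstream.
    return _create(shape, "float64" if dtype is None else dtype)
```

With `create_with_or`, an invalid `dtype=""` quietly becomes `float64`; with `create`, the same input reaches validation and raises `ValueError`.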
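The `NDArrayLike` signature change and the `-1` normalization in `NDBuffer.reshape` described in the commit above can be sketched as below. This is a minimal, stdlib-only illustration under stated assumptions: the protocol is trimmed (the real `reshape` also takes keyword-only `order`/`copy`, and the real `all` returns `np.bool_`), `ToyArray` is a hypothetical stand-in for `np.ndarray`, and this `NDBuffer` is a simplified illustration rather than zarr-python's class.

```python
from __future__ import annotations

import math
from typing import Literal, Protocol


class NDArrayLike(Protocol):
    # Hypothetical, trimmed-down protocol with a precise tuple-only reshape.
    def reshape(self, shape: tuple[int, ...], /) -> NDArrayLike: ...

    def all(self) -> bool: ...


class ToyArray:
    """Flat data plus a shape; structurally satisfies NDArrayLike."""

    def __init__(self, data: list[float], shape: tuple[int, ...]) -> None:
        self.data = data
        self.shape = shape

    def reshape(self, shape: tuple[int, ...], /) -> ToyArray:
        if shape.count(-1) > 1:
            raise ValueError("only one dimension may be -1")
        known = math.prod(d for d in shape if d != -1)
        if -1 in shape and known == 0:
            raise ValueError("cannot infer -1 next to a zero-length dimension")
        # Infer the -1 dimension from the total size, as numpy does.
        resolved = tuple(len(self.data) // known if d == -1 else d for d in shape)
        if math.prod(resolved) != len(self.data):
            raise ValueError(f"cannot reshape size {len(self.data)} into {shape}")
        return ToyArray(self.data, resolved)

    def all(self) -> bool:
        return all(self.data)


class NDBuffer:
    """Wrapper whose public reshape still accepts a bare -1."""

    def __init__(self, array: NDArrayLike) -> None:
        self.array = array

    def reshape(self, newshape: tuple[int, ...] | Literal[-1]) -> NDBuffer:
        # Normalize the bare -1 ("flatten") form to (-1,) so the protocol
        # method only ever receives tuple[int, ...].
        if isinstance(newshape, int):
            newshape = (-1,)
        return NDBuffer(self.array.reshape(newshape))


buf = NDBuffer(ToyArray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3)))
flat = buf.reshape(-1)
```

Doing the `-1` normalization in the wrapper keeps the protocol's signature narrow (one positional tuple), which is what lets `np.ndarray` satisfy it structurally without widening it to `(*args: Any, **kwargs: Any) -> Any`.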